Adapting the JIRS Passage Retrieval System to the Arabic Language

نویسندگان

  • Yassine Benajiba
  • Paolo Rosso
  • José Manuel Gómez Soriano
چکیده

The need of having a Passage Retrieval (PR) system for Arabic texts is due essentially to our aim to build an Arabic Question Answering (QA) system in our research team. We have chosen working on the PR system to be our first step to pursue our aim because being the core component and its quality will affect directly the performance of the QA system. JAVA Information Retrieval System (JIRS) is a PR QA-oriented system, multi-platform, open source and free to use. JIRS uses an n-gram model and it is language-independent. It separates language configuration files to make easier its adaptation to any language. In this paper, we report the different challenges when adapting the JIRS to the Arabic language.In order to evaluate JIRS on Arabic, we had to develop an Arabic test-bed using the multilingual CLEF QA one as guideline. We also report the results obtained in our experiments where we retrieved Arabic passages with JIRS first without any text preprocessing and second performing a prior light-stemming on the documents of the test-bed. The preliminary results show that it is possible to obtain a first Arabic passage retrieval system adapting JIRS on pre-processed text with a light-stemmer.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Re-ranking of Yahoo Snippets with the JIRS Passage Retrieval System

Passage Retrieval (PR) systems are used as first step of the actual Question Answering (QA) systems. Usually, PR systems are traditional information retrieval systems which are not oriented to the specific problem of QA. In fact, these systems only search for the question keywords. JIRS Distance Density n-gram system is a QA-oriented PR system which has given good results in QA tasks when this ...

متن کامل

TALP at GeoCLEF-2006: Experiments Using JIRS and Lucene with the ADL Feature Type Thesaurus

This paper describes our experiments in Geographical Information Retrieval (GIR) in the context of our participation in the GeoCLEF 2006 Monolingual English task. The TALPGeoIR system follows a similar architecture of the GeoTALP-IR system presented at GeoCLEF 2005 [2] with some changes in the Retrieval modes and the Geographical Knowledge Base. The system has four phases performed sequentially...

متن کامل

N -Gram vs. Keyword-Based Passage Retrieval for Question Answering

In this paper we describe the participation of the Universidad Politécnica of Valencia to the 2006 edition, which was focused on the comparison between a Passage Retrieval engine (JIRS) specifically aimed to the Question Answering task and a standard, general use search engine such as Lucene. JIRS is based on n-grams, Lucene on keywords. We participated in three monolingual tasks: Spanish, Ital...

متن کامل

Boosting Passage Retrieval through Reuse in Question Answering

Question Answering (QA) is an emerging important field in Information Retrieval. In a QA system the archive of previous questions asked from the system makes a collection full of useful factual nuggets. This paper makes an initial attempt to investigate the reuse of facts contained in the archive of previous questions to help and gain performance in answering future related factoid questions. I...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007